feat(plugins-google): add cached_content option for explicit context caching #5675
Open
kamil-bidus wants to merge 1 commit into livekit:main from
Conversation
The plugin currently relies on Gemini's implicit cache, which is heuristic. In voice-agent workloads where the system prompt is large and stable across calls, implicit caching often misses on turn 1 of a conversation, paying the full cold-start cost. Explicit caching is the documented alternative: the application creates a CachedContent resource via client.caches.create(...) and references it by name on subsequent generateContent calls. Cached prefix tokens are billed at a discount and processed in under 100ms.

The plugin already reads cached_content_token_count from response usage but had no way to set cached_content on requests. This adds the parameter on LLM.__init__, stores it on _LLMOptions, and propagates it into GenerateContentConfig via extra_kwargs.

End-to-end usability matters: Gemini rejects generateContent requests that pass cached_content together with system_instruction, tools, or tool_config — those fields belong inside the CachedContent resource. Without handling that, setting cached_content on any LLM that also has a system prompt or function tools would 400. So LLMStream._run now suppresses system_instruction, tools, and tool_config from the outgoing request whenever cached_content is attached.

Cache lifecycle (creation, TTL refresh, deletion) and the choice of what to bake into the cache stay the application's responsibility — the plugin only consumes the resource name and ensures the matching fields are absent from the request. Behaviour is unchanged for callers that don't pass cached_content: the gating is strictly is-given on that one option. Documented in the docstring so users know the cache must contain whichever of system_instruction / tools the model needs.

Tests cover propagation, the omitted-when-not-set default, and the three suppression branches (system_instruction stripped, tools stripped, tool_config stripped), plus the unchanged-when-no-cache backward-compat path.

Refs livekit#2359.
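For context, a sketch of the application-side flow this enables, using the google-genai SDK (model name, display name, and TTL are chosen for illustration):

```python
from google import genai
from google.genai import types
from livekit.plugins import google

SYSTEM_PROMPT = "..."  # the large, stable prefix worth caching

client = genai.Client()  # Developer API; use Client(vertexai=True, ...) on Vertex
cache = client.caches.create(
    model="gemini-2.0-flash-001",  # illustrative; must match the LLM's model
    config=types.CreateCachedContentConfig(
        display_name="voice-agent-system-prompt",
        system_instruction=SYSTEM_PROMPT,
        ttl="3600s",  # refresh/deletion stays the application's responsibility
    ),
)

# New in this PR: reference the cache by name; the plugin then omits
# system_instruction/tools/tool_config from its outgoing requests.
llm = google.LLM(model="gemini-2.0-flash-001", cached_content=cache.name)
```

Note that Gemini enforces a model-dependent minimum token count before content is cacheable, so very short system prompts may not qualify.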
kamil-bidus force-pushed from c57dd80 to 657894a
Motivation
The Gemini plugin's `LLM` class supports many `GenerateContentConfig` options (`thinking_config`, `retrieval_config`, `safety_settings`, etc.) but not `cached_content`. The plugin already reads `cached_content_token_count` from response usage in `LLMStream._parse_part`, so cache hits surface in metrics — there's just no way to attach a `CachedContent` resource to outgoing requests.
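For reference, where that count lives in google-genai terms (a direct SDK call shown for illustration; inside the agent the plugin surfaces it through its usage metrics):

```python
from google import genai

client = genai.Client()
resp = client.models.generate_content(
    model="gemini-2.0-flash-001",  # illustrative model name
    contents="hello",
)
# Non-zero when the request hit an explicit or implicit cache.
print(resp.usage_metadata.cached_content_token_count)
```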
Change

Add `cached_content: NotGivenOr[str] = NOT_GIVEN` to `LLM.__init__`, propagated through `_LLMOptions` → `chat()` → `extra["cached_content"]` → `GenerateContentConfig` via `**self._extra_kwargs`.
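A minimal sketch of that plumbing, using the names from this description but simplified (`build_extra_kwargs` is a hypothetical helper; in the plugin the collection happens inside `chat()`):

```python
from dataclasses import dataclass

from livekit.agents.types import NOT_GIVEN, NotGivenOr
from livekit.agents.utils import is_given


@dataclass
class _LLMOptions:
    model: str
    cached_content: NotGivenOr[str] = NOT_GIVEN  # new in this PR


def build_extra_kwargs(opts: _LLMOptions) -> dict:
    # Only explicitly-given options land in the dict that is later
    # splatted into GenerateContentConfig(**self._extra_kwargs).
    extra: dict = {}
    if is_given(opts.cached_content):
        extra["cached_content"] = opts.cached_content
    return extra
```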
Request-side suppression

Gemini's API rejects `generateContent` requests that pass `cached_content` together with `system_instruction`, `tools`, or `tool_config` — those fields belong inside the `CachedContent` resource itself, and the API returns a 400 instructing callers to move them.

Without handling that, the parameter would 400 on any realistic agent. So `LLMStream._run` strips `system_instruction`, `tools`, and `tool_config` from the outgoing request whenever `cached_content` is attached. Behaviour is unchanged when `cached_content` is unset.

Cache lifecycle (creation, TTL refresh, deletion) and the choice of what to bake into the cache stay the application's responsibility.
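A sketch of the suppression branch, assuming the request config is assembled as a google-genai `GenerateContentConfig` (the field names are the real SDK ones; `_apply_cached_content` is a hypothetical helper standing in for inline code in `LLMStream._run`):

```python
from google.genai import types

from livekit.agents.utils import is_given


def _apply_cached_content(config: types.GenerateContentConfig, cached_content) -> None:
    if not is_given(cached_content):
        return  # no cache attached: request goes out unchanged
    config.cached_content = cached_content
    # These three must live inside the CachedContent resource; sending
    # them alongside cached_content makes Gemini return a 400.
    config.system_instruction = None
    config.tools = None
    config.tool_config = None
```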
Compatibility
Default `NOT_GIVEN` keeps existing behaviour unchanged — verified by tests covering both the omission case (no key in `_extra_kwargs`) and the no-cache request path (`system_instruction` and `tools` propagate as before).

Works with both the Gemini Developer API (`cachedContents/{id}`) and Vertex AI (`projects/{p}/locations/{l}/cachedContents/{id}`); the plugin passes the string through unmodified.
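For illustration, the two accepted shapes (IDs, project, and location are placeholders):

```python
# Gemini Developer API: the short name returned by client.caches.create(...)
cached_content = "cachedContents/abc123"

# Vertex AI: the fully qualified resource name
cached_content = "projects/my-project/locations/us-central1/cachedContents/abc123"
```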
Tests

`tests/test_plugin_google_llm.py` — 6 cases:

- `cached_content` round-trips through `_LLMOptions` and reaches `_extra_kwargs`; the default `NOT_GIVEN` produces no key.
- Patching `client.aio.models.generate_content_stream` to capture the `GenerateContentConfig`, the request omits `system_instruction`/`tools`/`tool_config` when `cached_content` is set and includes them when it isn't (backward compat).

Existing google-plugin tests still pass.
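A sketch of the round-trip cases, asserting against the private attributes named above (`_opts` holding `_LLMOptions`, and `_extra_kwargs`); the real tests may assert via the captured `GenerateContentConfig` instead:

```python
from livekit.plugins import google


def test_cached_content_round_trip() -> None:
    llm = google.LLM(api_key="test", cached_content="cachedContents/abc123")
    assert llm._opts.cached_content == "cachedContents/abc123"
    assert llm._extra_kwargs["cached_content"] == "cachedContents/abc123"


def test_default_omits_key() -> None:
    # Default NOT_GIVEN: no key should appear in the kwargs splatted
    # into GenerateContentConfig.
    llm = google.LLM(api_key="test")
    assert "cached_content" not in llm._extra_kwargs
```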
`ruff check` / `ruff format` clean.

Refs livekit#2359.